DrillBit

Detection Pipeline

How it works

Every submission passes through a five-stage analytical pipeline: an AI probability score is assigned in stage 4 and presented for human review in stage 5.

1 Extraction

Text is extracted, tokenised, and language-verified. Formatting artefacts and metadata noise are stripped before analysis begins.
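As an illustration only, the cleaning and tokenising part of this stage could look like the sketch below. The function names `clean_text` and `tokenise` are hypothetical, plain-text input is assumed, and language verification is left out; DrillBit's actual extraction code is not published here.

```python
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Normalise Unicode, drop control/format characters, collapse whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    # Unicode categories starting with "C" cover control and format characters
    # (for example NUL bytes and zero-width spaces left behind by extraction).
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or not unicodedata.category(ch).startswith("C")
    )
    return re.sub(r"\s+", " ", text).strip()


def tokenise(text: str) -> list[str]:
    """Lowercase word tokens; apostrophes inside a word are kept."""
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())
```

For example, `tokenise(clean_text(raw))` yields the token list that the later sketches on this page take as input.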

2 Feature Analysis

Perplexity, burstiness, stylometric profile, n-gram density, and semantic coherence are measured across the full document.

3 Classification

An ensemble of neural networks and gradient-boosted models processes all extracted features simultaneously.
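One common way to combine several models is weighted soft voting, sketched below. This is an assumption about how an ensemble might be combined, not a description of DrillBit's internals; the `Model` type, the `ensemble_probability` function, and the weights are all hypothetical.

```python
from typing import Callable, Sequence

Features = dict[str, float]
Model = Callable[[Features], float]  # each model returns P(AI) in [0, 1]


def ensemble_probability(
    models: Sequence[Model],
    weights: Sequence[float],
    features: Features,
) -> float:
    """Weighted average of the per-model AI probabilities (soft voting)."""
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per model and at least one model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * m(features) for m, w in zip(models, weights)) / total
```

Every model sees the same feature dictionary, which mirrors the idea of processing all extracted features together rather than one signal at a time.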

4 Scoring

A continuous 0–100% AI probability score is generated. The score is never forced into a binary verdict; the evidence determines where it falls on the scale.

5 Review

Scores are presented to institutional reviewers. Uncertain cases in the 20–60% zone are flagged for human adjudication.
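The zone logic can be sketched in a few lines. The 20% and 60% boundaries come from the text above; whether the boundary values themselves count as uncertain is an assumption made here, and the function names are hypothetical.

```python
def score_zone(score: float) -> str:
    """Map a 0-100 AI probability score to a zone label."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("score must be between 0 and 100")
    if score < 20.0:
        return "human"
    if score <= 60.0:  # assumption: both boundaries belong to the uncertain zone
        return "uncertain"
    return "ai"


def needs_human_adjudication(score: float) -> bool:
    """Uncertain-zone scores are the ones flagged for adjudication."""
    return score_zone(score) == "uncertain"
```

A score of 45, for instance, falls in the uncertain zone and would be flagged.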

What DrillBit looks for

Characteristics of AI-generated text

Large language models leave consistent statistical fingerprints in the text they produce. DrillBit's engine is trained to detect all of these signals simultaneously.

Low perplexity

AI text is statistically predictable — each word follows high-probability patterns learned from training data. Human writing contains surprising, lower-probability word choices that deviate naturally from model expectations.
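Perplexity is the exponential of the negative mean log-probability a language model assigns to each token. The sketch below uses a smoothed unigram model as a stand-in so that it runs without any dependencies; a production detector would presumably use a neural language model, and `unigram_perplexity` is a hypothetical name.

```python
import math
from collections import Counter


def unigram_perplexity(reference_tokens: list[str], tokens: list[str]) -> float:
    """Perplexity of `tokens` under an add-one-smoothed unigram model."""
    if not tokens:
        raise ValueError("cannot compute perplexity of an empty text")
    counts = Counter(reference_tokens)
    vocab = len(counts) + 1  # one extra slot shared by all unseen words
    total = len(reference_tokens)
    log_prob = sum(
        math.log((counts[t] + 1) / (total + vocab)) for t in tokens
    )
    return math.exp(-log_prob / len(tokens))
```

Text made of words the model expects scores low; text full of words the model has rarely or never seen scores high.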

Low burstiness

Humans write in bursts — short punchy sentences punctuating longer analytical ones. AI-generated text displays unnaturally uniform sentence lengths, producing a flat rhythmic signature detectable by length-variance analysis.
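A simple length-variance measure is the coefficient of variation of sentence lengths, sketched below. The sentence splitter is deliberately naive (it breaks on `.`, `!`, and `?` followed by whitespace), and the choice of this particular statistic is an assumption rather than DrillBit's documented formula.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Word counts per sentence, using a naive punctuation-based splitter."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Standard deviation of sentence length divided by the mean length."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure variation
    return statistics.pstdev(lengths) / statistics.mean(lengths)
```

Uniform sentence lengths give a value near zero; a mix of very short and very long sentences pushes the value up.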

Stylometric uniformity

AI output lacks the idiosyncratic punctuation habits, vocabulary preferences, and syntactic quirks that characterise individual human authors. Stylometric profiling detects this absence of personal authorial voice.
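One small ingredient of a stylometric profile is punctuation usage. The sketch below counts a handful of marks per 100 words; the mark set `MARKS` and the function name are hypothetical, and a real profile would combine many more features, such as vocabulary and syntax.

```python
from collections import Counter

MARKS = ",;:()!?-"  # hypothetical feature set, chosen for illustration


def punctuation_profile(text: str) -> dict[str, float]:
    """Occurrences of each mark per 100 whitespace-separated words."""
    words = len(text.split())
    if words == 0:
        return {mark: 0.0 for mark in MARKS}
    counts = Counter(ch for ch in text if ch in MARKS)
    return {mark: 100.0 * counts[mark] / words for mark in MARKS}
```

Comparing such profiles across an author's earlier submissions is one way to see whether a personal habit, say frequent semicolons, is present or missing.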

Semantic overcoherence

Paragraphs produced by LLMs exhibit unnaturally smooth topic transitions and an absence of the digressive, self-correcting flow typical of genuine human reasoning and academic argumentation.
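Smoothness of topic transitions can be approximated by the similarity between adjacent paragraphs. The sketch below uses bag-of-words cosine similarity so that it stays dependency-free; this is an assumption for illustration, and a real system would more likely compare sentence or paragraph embeddings.

```python
import math
import re
from collections import Counter


def _bag(paragraph: str) -> Counter:
    """Word-count vector for one paragraph."""
    return Counter(re.findall(r"[a-z']+", paragraph.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def transition_similarities(paragraphs: list[str]) -> list[float]:
    """Similarity of each paragraph to the one that follows it."""
    bags = [_bag(p) for p in paragraphs]
    return [cosine(x, y) for x, y in zip(bags, bags[1:])]
```

Consistently high values with little variation would be the overcoherent pattern described above, while digressions show up as occasional dips.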

N-gram pattern density

AI models reuse common phrase-level constructions across documents. High-frequency n-gram matching against a trained reference corpus reveals these repeated structural and lexical patterns.
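N-gram density can be sketched as the share of a document's n-grams that also appear in a reference set. The reference set here is a hypothetical in-memory `set` of token tuples; how DrillBit builds and stores its trained reference corpus is not described on this page.

```python
def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-token windows, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_density(
    tokens: list[str],
    reference: set[tuple[str, ...]],
    n: int = 3,
) -> float:
    """Fraction of the document's n-grams found in the reference set."""
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0  # document shorter than n tokens
    return sum(g in reference for g in grams) / len(grams)
```

With a reference containing "it is important" and "is important to", the phrase "it is important to note that" has a trigram density of 0.5.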

Lexical richness flatness

Type-token ratio and lexical diversity measures tend to fall within a narrower band in AI text than in human writing, which varies considerably based on vocabulary breadth, register shifts, and individual expression.
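Type-token ratio is the number of distinct words divided by the total number of words. Because the raw ratio falls as documents get longer, the sketch below also computes it over fixed-size windows; the window size and function names are assumptions made for illustration.

```python
def type_token_ratio(tokens: list[str]) -> float:
    """Distinct tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def windowed_ttr(tokens: list[str], window: int = 50) -> list[float]:
    """Type-token ratio for each consecutive, non-overlapping window."""
    if len(tokens) < window:
        return [type_token_ratio(tokens)] if tokens else []
    return [
        type_token_ratio(tokens[i:i + window])
        for i in range(0, len(tokens) - window + 1, window)
    ]
```

The spread between the smallest and largest window values is one way to express the "band" mentioned above: a narrow spread suggests flat lexical richness.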

Detection Coverage

AI platforms DrillBit detects

DrillBit's detection is validated across the three dominant AI writing platforms used in academic contexts (ChatGPT, Gemini, and Grok), with ongoing updates as new models are released.

Platform	Models covered	Content characteristics	Detection status
ChatGPT (OpenAI)	GPT-3.5, GPT-4, GPT-4o	Fluent, structured academic prose; consistent formal register across disciplines; strong paragraph organisation.	Fully supported
Gemini (Google DeepMind)	Gemini 1.0, 1.5 Pro	Information-dense output; varied register; strong technical vocabulary; tendency toward structured enumeration.	Fully supported
Grok (xAI)	Grok-1, Grok-1.5	Conversational-to-formal range; distinct syntactic patterns; variable formality across prompt types.	Fully supported
Paraphrased AI (any platform)	Any model with manual or automated paraphrasing applied post-generation.	AI-generated text with surface-level edits intended to mask origin signals. Burstiness and stylometric markers often remain detectable.	Partial, improving
Mixed authorship (any platform)	AI-assisted drafting interspersed with human-written passages.	Hybrid documents where AI and human sections alternate. Represents an emerging authorship pattern in academic submissions.	Partial, improving

Score Interpretation Guide

What does an AI score mean?

DrillBit assigns every document a continuous AI probability score from 0 to 100%. The scale runs from a human zone at the low end, through an uncertain zone (20–60%), to an AI zone at the high end. For any score, the interpretation guide reports three things: the classification, the score zone, and the recommended action.

Validated Performance

Accuracy you can cite

DrillBit's detection accuracy was validated in a large-scale study across 2.5 million document samples — one of the largest empirical AI detection evaluations published to date.

93%

Human Detection Accuracy

True Negative Rate across 1,000,000 human-authored samples

83%

AI Detection Accuracy

True Positive Rate across 1,000,000 AI-generated samples

88%

Overall System Accuracy

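(TP + TN) / Total ≈ (830,000 + 930,000) / 2,000,000 = 176 / 200 = 88%, using the counts implied by the two rates above across the 2,000,000 human and AI samples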

Dataset: 1,000,000 human samples (research papers pre-2018, student submissions pre-2015), 1,000,000 AI samples (ChatGPT, Gemini, Grok; minimum 500 words each), and 500,000 mixed samples. The dataset covers 8 academic disciplines: science & technology, general medicine, anatomy, social science, literature, basic science, robotics, and edge computing. The full methodology is disclosed in the DrillBit AI Detection White Paper.
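The three headline figures follow from a standard confusion matrix. The sketch below recomputes them from counts implied by the published rates; the counts are an assumption (the rates may be rounded), and the 500,000 mixed samples are left out because the two rates above are each stated over 1,000,000 samples.

```python
def rates(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """True positive rate, true negative rate, and overall accuracy."""
    return {
        "tpr": tp / (tp + fn),  # AI samples correctly flagged as AI
        "tnr": tn / (tn + fp),  # human samples correctly cleared
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Counts implied by 83% of 1,000,000 AI samples and 93% of 1,000,000
# human samples (assumed exact for the purpose of this illustration).
metrics = rates(tp=830_000, fn=170_000, tn=930_000, fp=70_000)
```

Under these assumed counts, overall accuracy is 1,760,000 / 2,000,000, which matches the 88% figure.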

Common questions

Frequently asked

Can a student be penalised based solely on the AI score?

No. DrillBit's AI scores are designed as indicators for human review, not automated enforcement tools. Any institutional action must involve qualified human assessment of the submission in its full context. The score is one data point — not a verdict. DrillBit strongly recommends that institutions establish clear AI use policies that specify how scores are reviewed and what evidentiary standard is applied before any disciplinary process is initiated.

Why is there an uncertain zone between 20% and 60%?

This range represents genuine classification ambiguity — content that exhibits a mixture of human and AI linguistic characteristics. Rather than forcing a binary result where the evidence is weak, DrillBit surfaces the score transparently and flags these cases for human review. This design minimises false accusations while maintaining strong detection at the clearly AI or clearly human extremes. Documents in this range may reflect post-edited AI content, mixed authorship, or structured human writing styles that partially overlap with AI output patterns.

What if a student writes in a very formal or structured academic style?

Formal human writing can produce elevated AI scores, particularly in scientific and technical disciplines where conventions require precise, structured prose. This is a known challenge across all AI detection systems. DrillBit's 20% classification boundary is calibrated to tolerate structured human writing, and the validated 93% human detection accuracy is consistent with this, although it also means that some human-authored documents are still misclassified. Reviewers are advised to consider writing style, prior submission history, and other contextual evidence alongside the AI score before drawing any conclusions.

Does DrillBit detect AI in languages other than English?

The current validated evaluation covers English-language documents. Multilingual AI detection is on our active development roadmap, with Arabic, Spanish, French, Mandarin, and Hindi prioritised for the next model release cycle. Institutions with non-English submission requirements are encouraged to contact DrillBit directly to discuss rollout timelines.

How does DrillBit stay current as new AI models are released?

DrillBit maintains a continuous retraining pipeline. When significant new AI models are released publicly, samples generated by those models are collected, labelled, and incorporated into the next training cycle. Detection performance against new models is evaluated against held-out test sets before any update is deployed to production. Institutions are notified of significant model updates through the platform release notes.

Where can I read the full accuracy evaluation methodology?

DrillBit publishes a full white paper disclosing dataset composition (2.5 million samples across 8 disciplines), classification boundary conditions, the complete confusion matrix, and all performance metrics including sensitivity, specificity, and overall accuracy with formulas. The white paper is available for free download from the DrillBit resources page — no account required.

What is the minimum document length for reliable detection?

DrillBit's detection engine is optimised for documents of 500 words or more — the threshold used in our validation study. Very short documents (under 200 words) may produce lower-confidence scores because the statistical signals used for classification require sufficient text to be measurable. For short submissions, scores should be interpreted with additional caution and reviewer discretion.
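A reviewer-facing tool could attach a length caveat to each score, as sketched below. The 200 and 500 word thresholds come from the answer above; the label names and the treatment of the 200 to 499 word range as a middle band are assumptions.

```python
def length_confidence(text: str) -> str:
    """Label how much weight a score deserves based on document length."""
    words = len(text.split())
    if words < 200:
        return "low"        # too little text for stable statistics
    if words < 500:
        return "reduced"    # assumption: below the validated threshold
    return "validated"      # at or above the 500-word validation threshold
```

A 300-word submission would be labelled "reduced", signalling that its score calls for extra reviewer discretion.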
